FOREWORD

This is the third issue of Notes on System Theory. The purpose of these notes is twofold. First, to provide an auxiliary publication medium for short contributions by the students and faculty who are engaged in research in systems and related areas. Second, to contribute to the development of system theory as a basic scientific discipline.

A PITFALL IN ADAPTIVE SYSTEM DESIGN

A. R. Bergen and W. F. Colescott

Design procedures for adaptive systems often proceed along the following general lines: first, a design is made based on the assumption that the structure and all parameters of the system under study are known; second, given observations of the system as it evolves in time, a method is devised for estimating those parameters which are in fact not known initially; and third, these estimates are then used in the original design in place of the unknown parameters.

This would seem to be a reasonable approach, and in some cases it has been used successfully. However, if one proceeds blindly the results can be very misleading. The following example, selected for its simplicity, demonstrates the deceptive nature of such a procedure, which, in a more complicated example, would not be as obviously transparent.

Consider a signal $\{x_n\}$ generated by the discrete (scalar) system

$$x_{n+1} = a_n x_n, \qquad n = 0, 1, 2, 3, \ldots \qquad (1)$$

where $\{a_n\}$ and $x_0$ are unknown parameters.

The signal is contaminated by noise and we observe $\{y_n\}$, where

$$y_n = x_n + u_n, \qquad n = 0, 1, 2, 3, \ldots \qquad (2)$$

The $\{u_n\}$ are independent, Gaussian random variables with zero means. The problem is to obtain a "good" estimate of $x_j$ given the observed values $y_0, y_1, y_2, \ldots, y_n$. Following the procedure outlined above:



First: Assume that the actual $\{a_i\}$ are known; the minimum variance unbiased estimate $x_j^*$ of $x_j$ given $y_0, y_1, \ldots, y_n$ is

$$x_j^*(\beta) = \beta_j \, \frac{\sum_{i=0}^{n} \beta_i y_i}{\sum_{i=0}^{n} \beta_i^2} \qquad (3)$$

where

$$\beta_i = \prod_{k=0}^{i-1} a_k, \qquad \beta = (\beta_0, \beta_1, \ldots, \beta_n)'.$$



Second: Estimate the unknown $a_i$ (or equivalently the $\beta_i$) by finding the $\beta_i$ ($a_i$) which minimize

$$\sum_{j=0}^{n} \left[ y_j - x_j^*(\beta) \right]^2. \qquad (4)$$



Third: It might now be argued (incorrectly) that as $n$ increases the estimates of the $\beta_i$ ($a_i$) will improve, since there is more data upon which to base the estimates. Thus, when these estimates are substituted for the $\beta_i$ in (3), an estimate of $x_j$ which improves as $n$ increases is obtained.

However, if the second and third steps for this example are carried out, the following result is obtained:

$$\hat{\beta}_j = \frac{y_j}{y_0} \quad \text{and} \quad x_j^*(\hat{\beta}) = y_j. \qquad (5)$$

The estimate for $x_j$ is therefore just the observed $y_j$, and the complex estimating machinery used above is not needed. Of course, due to the simplicity of the example, this result could have been seen from the start. $y_j$ is clearly the minimum variance unbiased estimate of $x_j$, since with no restriction on the underlying process which generates $\{x_n\}$, knowledge of any $y_i$ (or even of $x_i$) can in no way help in estimating any other $x_j$ ($i \ne j$).
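The collapse described by (5) can be checked numerically. The following is a minimal sketch, assuming equal noise variances and the form of (3) given above; the names `mvu_estimate` and `residual`, the random seed, and all numerical values are illustrative choices and not part of the original note.

```python
import numpy as np

def mvu_estimate(beta, y):
    """Estimate of every x_j from equation (3), for a given vector beta."""
    beta = np.asarray(beta, dtype=float)
    y = np.asarray(y, dtype=float)
    x0_hat = (beta @ y) / (beta @ beta)   # least-squares estimate of x_0
    return beta * x0_hat                  # x_j* = beta_j * x_0*

def residual(beta, y):
    """Criterion (4): sum of squared differences between y_j and x_j*(beta)."""
    return float(np.sum((np.asarray(y) - mvu_estimate(beta, y)) ** 2))

rng = np.random.default_rng(0)
n = 20
a = rng.uniform(0.5, 1.5, size=n)                   # hypothetical true a_n
beta_true = np.concatenate(([1.0], np.cumprod(a)))  # beta_i = a_0 * ... * a_{i-1}
x = 2.0 * beta_true                                 # x_0 = 2 (illustrative)
y = x + rng.normal(0.0, 0.3, size=n + 1)            # noisy observations

beta_fit = y / y[0]               # the minimizer reported in (5)
x_fit = mvu_estimate(beta_fit, y)

print(residual(beta_true, y))     # positive: the true beta leaves a residual
print(residual(beta_fit, y))      # essentially zero
print(np.allclose(x_fit, y))      # the "estimate" is just the data
```

Because the fitted $\beta$ has as many free components as there are observations, criterion (4) can be driven to zero for any data, which is why the resulting estimate carries no smoothing at all.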


CALCULATION  OF  CHANNEL  CAPACITY 


E.  Eisenberg 


The problem of computing the capacity of a discrete, constant channel is the following: given an $m \times n$ matrix $P = [p_{ij}]$, where each row of $P$ represents a probability distribution, to find $y = (y_1, \ldots, y_m)$ which maximizes

$$F(y) = \sum_{i,j} y_i p_{ij} \log \frac{p_{ij}}{\sum_{s=1}^{m} y_s p_{sj}} \qquad (1)$$

subject to

$$y_i \ge 0, \qquad \sum_{i=1}^{m} y_i = 1.$$


In order to avoid inessential arguments we assume that each column of $P$ has at least one positive entry. It should also be noted that the logarithm is taken with respect to an arbitrary fixed base $b > 1$.

That the problem described by (1) is nontrivial in general can be seen from the fact that in [2, p. 136] it is stated incorrectly that $y$ is a solution of (1) if, and only if,

$$C = \sum_{j=1}^{n} p_{ij} \log p_{ij} - \sum_{j=1}^{n} p_{ij} \log \sum_{s=1}^{m} y_s p_{sj} \qquad \text{for } i = 1, \ldots, m \qquad (2)$$

and $y \ge 0$, $\sum_i y_i = 1$.


That  (2)  need  not  hold  for  any  optimizing  y  can  be  seen  by 

taking 


3  5_ 

8  8 


It is readily checked that in this case $y = (1753)^{-1}(1163, 590, 0)$ is the only optimal $y$ and the first condition in (2) is violated when $i = 3$. Furthermore, a procedure advocated in [2, p. 139] for finding a solution of (1) which is based on (2) may yield an incorrect answer. The procedure in question is this: solve (2) without requiring $y \ge 0$ (it is not clear that a solution will always exist); if it turns out that $y \ge 0$, then we have a desired $y$. The difficulty arises in case some $y_i$ turn out negative. It is then suggested by Fano that one solve the $m$ problems generated by (2) when letting one $y_i = 0$ at a time. If any of the $y$'s thus obtained satisfies $y \ge 0$, then the one with largest $F(y)$ is accepted as a solution to (1); otherwise set two of the $y_i$'s equal to zero, etc. It is not clear whether the first relation in (2), for that $y_i$ which is zero, is retained or not. If it is retained one is certainly going to be in trouble, as can be seen from the example cited above. If the particular relation is omitted one may get into difficulties too, in case two of the $y_i$'s must be zero at an optimum. This will be the case when:


$$P = \begin{pmatrix} 1/2 & 1/2 \\ 1/3 & 2/3 \\ 3/8 & 5/8 \\ 3/8 & 5/8 \end{pmatrix}$$

With respect to (2) the correct statement is: $y$ is a solution of (1) if, and only if,

$$y \ge 0, \qquad \sum_{i=1}^{m} y_i = 1, \qquad (3a)$$

$$\sum_{i=1}^{m} y_i p_{ij} > 0, \qquad \text{all } j = 1, \ldots, n, \qquad (3b)$$

$$F(y) \ge \sum_{j=1}^{n} p_{ij} \log p_{ij} - \sum_{j=1}^{n} p_{ij} \log \left[ \sum_{s=1}^{m} y_s p_{sj} \right], \qquad \text{all } i. \qquad (3c)$$

If one multiplies (3c) by $y_i$ and then sums over all $i$ one obtains $F(y) \ge F(y)$; thus equality must hold in (3c) whenever $y_i > 0$, but the crucial point is that when $y_i = 0$, (3c) must still hold, though it may turn out to be a strict inequality.
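Condition (3c) can be illustrated on the four-row example above. The sketch below is not the procedure discussed in this note: it uses the Blahut-Arimoto iteration, a later and different method, purely as a convenient way to approximate a maximizing $y$, and then evaluates both sides of (3c). Natural logarithms are used, all $p_{ij}$ are assumed positive, and the function names are hypothetical.

```python
import numpy as np

def divergences(P, q):
    """Right-hand side of (3c) for every row i: sum_j p_ij log(p_ij / q_j).
    Assumes every entry of P is strictly positive."""
    return np.sum(P * np.log(P / q), axis=1)

def approx_optimal_input(P, iters=5000):
    """Blahut-Arimoto iteration (not the method of this note)."""
    m = P.shape[0]
    y = np.full(m, 1.0 / m)              # start from the uniform input
    for _ in range(iters):
        q = y @ P                        # output distribution, q_j = sum_s y_s p_sj
        y = y * np.exp(divergences(P, q))
        y /= y.sum()
    D = divergences(P, y @ P)
    return y, float(y @ D), D            # y, F(y), and the m values in (3c)

P = np.array([[1/2, 1/2],
              [1/3, 2/3],
              [3/8, 5/8],
              [3/8, 5/8]])
y, F, D = approx_optimal_input(P)

print(np.round(y, 6))    # weight concentrates on the first two rows
print(F)                 # approximate capacity, in nats
print(D - F)             # ~0 where y_i > 0, strictly negative where y_i = 0
```

For the two rows that receive zero weight the inequality in (3c) is strict, which is exactly the case that (2) fails to allow for.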

A fundamental tool needed to establish the equivalence of (1) and (3), as well as other pertinent results, is the following well known Lemma 1:

Let $a_j$, $b_j$ ($j = 1, \ldots, n$) be given numbers which satisfy $a_j \ge 0$, $b_j > 0$, all $j$, and $\sum_{j=1}^{n} a_j > 0$. Then,

$$\sum_{j=1}^{n} a_j \log \frac{a_j}{b_j} \ge \left[ \sum_{j=1}^{n} a_j \right] \log \frac{\sum_{j=1}^{n} a_j}{\sum_{j=1}^{n} b_j}. \qquad (4)$$

Furthermore, equality holds in (4) if, and only if, there exists a number $\lambda$ such that $a_j = \lambda b_j$ for all $j = 1, \ldots, n$. One of the important consequences of Lemma 1 is that if $y$ and $y'$ both solve (1), then, even though $y$ need not equal $y'$, we must have:

$$\sum_{i=1}^{m} y_i p_{ij} = \sum_{i=1}^{m} y_i' p_{ij}, \qquad j = 1, \ldots, n. \qquad (5)$$



We now turn our attention to the problem of finding a solution of (1), or of (3), in practice. It is known that the function $F(y)$ in (1) is a concave function; thus (1), its constraints being linear, is a convex programming problem. However, we wish to make use of a general convex programming method which requires that all functions in question possess gradients, and clearly $F$ is not differentiable at points where some $y_i = 0$. It is possible, though, to state a convex programming problem which is, in a sense, equivalent to (1) and which enjoys the property that all its functions are differentiable. The problem is:


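find $\alpha$ and $\omega = (\omega_1, \ldots, \omega_n)$ which minimize $\alpha$, subject to

$$\alpha \ge \sum_{j=1}^{n} p_{ij} \left( -\omega_j + \log p_{ij} \right), \qquad i = 1, \ldots, m, \qquad (6)$$

$$\sum_{j=1}^{n} e^{\omega_j} \le 1.$$
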

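The correspondence between (1) and (6) is: If $y$ solves (1), then $\alpha = F(y)$, $\omega_j = \log \sum_{s=1}^{m} y_s p_{sj}$ solve (6). If $\alpha$, $\omega$ solve (6), then let

$$x_j = e^{\omega_j} \left[ \sum_{k=1}^{n} e^{\omega_k} \right]^{-1}$$

and any solution $y$ of

$$y = (y_1, \ldots, y_m) \ge 0, \qquad \sum_{i=1}^{m} y_i p_{ij} = x_j, \quad j = 1, \ldots, n \qquad (7)$$
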

will then be a solution of (1). The important point is that, provided $\alpha$, $\omega$ solve (6), there will always exist a solution of (7). The proof of the above described equivalence between (1) and (7) is not difficult; it requires certain duality results for convex-homogeneous programming [1]. The interest of (6) lies, among other things, in the fact that in order to solve it one may apply the "cutting plane method" for solution of convex programming problems which is described in [3, Chapt. 6]; this method, being quite general, only guarantees that a solution is obtained as a limit of a convergent process. Provided the convergence rate is reasonable, such a method is quite satisfactory from the practical point of view. It would be of interest, both theoretical and practical, to have a method which yields a solution to (1) in a finite number of steps. We will now describe such a method, which is known to yield a solution after a finite number of steps when $n$, the number of output symbols, is two; it also may, with some slight modifications, do the same for any $n$. This last is still an open question, and one of the purposes of this note is to draw attention to this problem.

The algorithm is as follows: take any $x = (x_1, \ldots, x_n)$ such that $x_j > 0$, all $j$, and $\sum_j x_j = 1$. Next calculate


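$$M = \min_i \left[ -\sum_{j=1}^{n} p_{ij} \log p_{ij} + \sum_{j=1}^{n} p_{ij} \log x_j \right] \qquad (8)$$

and let

$$I = \left\{ i \;\middle|\; M = -\sum_{j=1}^{n} p_{ij} \log \frac{p_{ij}}{x_j} \right\}.$$

The next step is to solve the following linear programming problem: find $y = (y_1, \ldots, y_m)$ which maximizes $\sum_{i=1}^{m} y_i$ subject to

$$y \ge 0, \qquad \sum_{i} y_i p_{ij} \le x_j \quad \text{all } j = 1, \ldots, n, \qquad y_i = 0 \text{ if } i \notin I.$$

This linear programming problem always has a solution with $y \ge 0$ and $0 < \sum_{i=1}^{m} y_i \le 1$. If $\sum_{i=1}^{m} y_i = 1$, then $yP = x$ and we have a solution to (1). Otherwise, $\theta = \sum_{i=1}^{m} y_i$ is strictly between 0 and 1, and $\sum_{i=1}^{m} y_i p_{ij} < x_j$ for some $j$. Let

$$J = \left\{ j \;\middle|\; \sum_{i} y_i p_{ij} = x_j \right\},$$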


=  y  x.,  cr,  =  y  p.. 

£—  J  1  1J 


111 

1  v 

max  —  )  y.p.. 

j<J  "j  %  113 
X.^  =  max
i«I.  s^1 

o*  <  c r. 
s  1 


X  =  max  (X^s  X^ 


M-d 

s 

exp - 

cr  —o' 

1  s 


where 


We  now  modify  x  according  to  the  formula, 


x! 

J 


-  x  [\  +  (1-X.)  9 ]  _1  if  j  €  J 

'  xA[  K  +  (l-\)0]  ifj^J 


Next one recalculates (8) with the new $x'$, then proceeds to (9), (10), etc. This process will yield the answer (i.e., a maximum of 1 in (10)) in a finite number of steps provided $n = 2$.
CONTROLLABILITY OF SYSTEMS WITH TIME LAGS

A. Larsen

Consider a linear, time invariant system describable by a set of $n$ linear first order difference-differential equations of the form

$$\dot{x}(t) = A\,x(t-1) + B\,u(t), \qquad t \ge 1, \qquad (1)$$

where $A$ is $n \times n$, $B$ is $n \times r$, and $u$ is an $r$ dimensional control.

Let $f(t)$ denote the initial response vector representing $x(t)$ during the time interval $0 \le t \le 1$ and assumed continuous. $f(t)$, $0 \le t \le 1$, is analogous to an initial condition $x(1)$ for a system without lags.

I. Definition:

The system (1) is controllable if and only if for all initial response vectors $f(t)$ which are continuous on $[0, 1]$ there exists a finite time $T \ge 1$ and a control vector $u(t)$ over $[1, T]$ such that $x(T) = 0$.

II. Sufficient Condition for Controllability of (1):

The system (1) is controllable prior to the time $t = k^+$ if the columns of

$$\left[ A^{k-1}B \;\middle|\; A^{k-2}B \;\middle|\; \cdots \;\middle|\; AB \;\middle|\; B \right] \qquad (2)$$

span the $n$ dimensional $x$ space. (Note that by the Cayley-Hamilton Theorem the maximum number of independent columns of (2) is determined at $k = n$.)
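Condition II is a rank test on the matrix (2) and can be checked mechanically. The following is a minimal sketch; the function names and the two test systems are illustrative assumptions, not taken from the note.

```python
import numpy as np

def lag_controllability_matrix(A, B, k):
    """Build [A^(k-1) B | A^(k-2) B | ... | A B | B], the matrix in (2)."""
    blocks = [np.linalg.matrix_power(A, j) @ B for j in range(k - 1, -1, -1)]
    return np.hstack(blocks)

def satisfies_condition_II(A, B, k=None):
    """True if the columns of (2) span the n-dimensional x space.
    By the Cayley-Hamilton remark, k = n is enough to reach maximum rank."""
    n = A.shape[0]
    k = n if k is None else k
    M = lag_controllability_matrix(A, B, k)
    return bool(np.linalg.matrix_rank(M) == n)

# Hypothetical two-state system with one input: condition II holds.
A1 = np.array([[0.0, 1.0], [0.0, 0.0]])
B1 = np.array([[0.0], [1.0]])
print(satisfies_condition_II(A1, B1))   # True

# Scalar system with no input channel (B = 0): condition II fails.
A2 = np.array([[-2.0]])
B2 = np.array([[0.0]])
print(satisfies_condition_II(A2, B2))   # False
```

The second system has the same structure as the uncontrolled scalar equation used later in the note as a counterexample: the rank test fails, yet that alone does not decide controllability for a system with lags.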
Proof:

Since the statement is only a sufficiency condition we need only concern ourselves with the terms in the solution $x(t)$ of (1) depending on $u(t)$. If for some $T$, $u(t)$ over the interval $1 \le t \le T$ can be chosen such that the terms involving $u(t)$ can represent any vector in the $x$ space, then they can certainly represent the negative of the vector which is the sum of the remaining terms not involving $u(t)$, leaving $x(T) = 0$.

Equations of the type (1) can be solved recursively. Over the interval $1 \le t \le 2$ we have from (1),

$$x(t) = \int_1^t A\,x(\tau-1)\,d\tau + \int_1^t B\,u(\tau)\,d\tau + x(1)$$

or

$$x(t) = k_1(t) + B \int_1^t u(\tau)\,d\tau,$$

where $k_1(t)$ represents all terms in the solution not involving $u(t)$. Continuing,

$$x(t) = k_2(t) + A B \int_2^t u^{(1)}(\tau-1)\,d\tau + B \int_1^t u(\tau)\,d\tau, \qquad 2 \le t \le 3,$$

where again $k_2(t)$ represents terms not involving $u(t)$ and we define

$$u^{(1)}(t-1) = \int_1^t u(\tau-1)\,d\tau.$$

Finally

$$x(t) = k_k(t) + \sum_{j=1}^{k} A^{j-1} B \int_j^t u^{(j-1)}(\tau-j+1)\,d\tau, \qquad k \le t \le k+1, \qquad (3)$$

where in general $u^{(j)}$ is obtained from $u^{(j-1)}$ in the same way that $u^{(1)}$ is obtained from $u$.

The input with arbitrary coefficient vectors $a_1, \ldots, a_k$,

$$u(t) = a_1\,\delta(t-k) + a_2\,\delta'(t-k+1) + \cdots + a_k\,\delta^{(k-1)}(t-1), \qquad (4)$$

where $\delta$ is the unit impulse, $\delta'$ the doublet, etc., allows the coefficients of the columns of (2) to be chosen arbitrarily. With the input (4) the right-hand term in (3) becomes

$$A^{k-1} B\,a_k + A^{k-2} B\,a_{k-1} + \cdots + B\,a_1.$$

Thus, if the columns of (2) span the $x$ space, the right-hand expression in (3) can represent any vector in this space through a suitable choice of coefficients of the control (4).

Counterexample to Show that Condition II is Not Necessary:

Consider the system (5):

$$\dot{x}(t) = -c\,x(t-1); \qquad c > 1; \qquad x(t) = f(t), \quad 0 \le t \le 1. \qquad (5)$$

The following can be shown:

(i) All the roots of the characteristic equation of (5) are complex with nonzero imaginary parts.

Proof:

The expression which gives the roots of the characteristic equation is

$$s + c\,e^{-s} = 0. \qquad (6)$$

Let $s = \sigma + j\omega$. Then (6) becomes

$$\sigma + c\,e^{-\sigma} \cos \omega = 0 \qquad (7a)$$

$$\omega - c\,e^{-\sigma} \sin \omega = 0. \qquad (7b)$$

If $\omega = 0$, then (7a) implies $\sigma = -c\,e^{-\sigma}$, which can never be satisfied since $c > 1$.

(ii) Since the coefficients of (5) are all real, all roots of the characteristic equation occur in complex conjugate pairs.

It thus follows from (i) and (ii) that all terms in the solution of (5) are of the form $p(t)\,e^{\sigma t} \cos(\omega t + \phi)$, where $p(t)$ is a polynomial in $t$ depending on the order of the root. ($p(t)$ is of finite order since the number of roots of an analytic function with real parts on any finite interval of the real axis in the complex $s$ plane is finite. See reference 1.)

(iii) If $\omega_1$ is the imaginary part of a root with real part $\sigma_1$, then any other root with real part $\sigma_1$ must have an imaginary part $\pm\omega_1$.

Proof:

Fix $\sigma_1$. If equation (7a) is to be satisfied for some other imaginary part $\omega_2$, then $\omega_2$ must satisfy $\omega_2 = \pm\omega_1 + 2\pi n$; $n = \pm 1, \pm 2, \ldots$. However, this satisfies (7b) only if $\omega_2 = \pm\omega_1$.

Let $\sigma_0$ and $\pm\omega_0$ denote the real and imaginary parts of the roots with the largest real part having a corresponding nonvanishing time expression in the solution of (5). Further, let $d_p\,t^p e^{\sigma_0 t} \cos(\omega_0 t + \phi_p)$ be the term in this expression with the largest integer $p$ such that the constant $d_p \ne 0$. Then as $t \to \infty$, the amplitude of oscillation of the solution of (5) is given by $\pm d_p\,t^p e^{\sigma_0 t}$. Hence, during each half cycle, for sufficiently large finite $t$, $x(t)$ crosses the $t$ axis. Thus, $x(t)$ reaches zero of its own accord and the system is controllable. However, II is not satisfied.
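The zero crossing claimed for (5) can be observed in a simple simulation. The sketch below integrates (5) by a forward Euler step with the delayed value read from stored history; the step size, the choice $c = 2$, the constant initial response $f(t) = 1$, and the function name are all illustrative assumptions.

```python
import numpy as np

def first_zero_crossing(c, f, t_end=10.0, h=1e-3):
    """Integrate x'(t) = -c x(t-1), x(t) = f(t) on [0, 1], by forward Euler
    and return the first time t > 1 at which x changes sign (None if never)."""
    N = int(round(1.0 / h))                  # grid points per unit delay
    steps = int(round(t_end / h))
    t = np.arange(steps + 1) * h
    x = np.empty(steps + 1)
    x[: N + 1] = [f(s) for s in t[: N + 1]]  # initial response on [0, 1]
    for k in range(N, steps):
        x[k + 1] = x[k] - c * h * x[k - N]   # x[k - N] approximates x(t - 1)
    for k in range(N, steps):
        if x[k] == 0.0:
            return float(t[k])
        if x[k] * x[k + 1] < 0.0:            # sign change: interpolate linearly
            return float(t[k] + h * x[k] / (x[k] - x[k + 1]))
    return None

t_cross = first_zero_crossing(c=2.0, f=lambda s: 1.0)
print(t_cross)
```

For this particular initial response the solution on $1 \le t \le 2$ is the straight line $x(t) = 1 - 2(t - 1)$, so the crossing found numerically can be compared with the exact value $t = 1.5$.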

Remarks:

The sufficiency condition II is very similar to the necessary and sufficient condition for the controllability of the analogous system without delays,

$$\dot{x}(t) = A\,x(t) + B\,u(t). \qquad (8)$$

A necessary and sufficient condition for the controllability of (8) is that the columns of (2) span the $x$ space with $k = n$. The difference is that with system (1) one must in general wait until time $t = n$ before (2) fills out to its maximum rank, whereas in the case of system (8) the maximum rank is attained immediately.

The other major difference between (1) and (8) regarding controllability is the nonnecessity of II if the system has lags. This is due to the possibility of an oscillation in a simple first-order difference-differential equation, as the counterexample shows.

Reference:

1. Caratheodory, C., Theory of Functions, New York: Chelsea Publishing Co., 1954 (Translation).
MACHINES THAT GENERATE REGULAR MARKOV CHAINS

D. Moorehead

Any finite state machine will, by virtue of the Markov character of the "next state" function, generate a finite Markov chain when a distribution is defined over the input alphabet. Of particular interest is that class of machines which will generate a "regular" Markov chain.+ Such machines have the property that the probability distribution vector defined over the states converges to a vector constant in time. Given a Moore machine model we may identify an output symbol with a subset of the state space. This in turn permits us to specify a distribution vector over the output that is constant in time.

Suppose, then, a Moore machine $M$ with the following characterization: an input alphabet set

$$U = \{\alpha_1, \alpha_2, \ldots, \alpha_r\},$$

an output alphabet set

$$Y = \{\beta_1, \ldots, \beta_s\},$$

and a state space

$$S = \{\sigma_1, \sigma_2, \ldots, \sigma_n\},$$

with

$u_t$ = input at time $t$,

$y_t$ = output at time $t$.

[Figure: block diagram of machine $M$ with input alphabet $\{\alpha_1, \ldots, \alpha_r\}$ and output alphabet $\{\beta_1, \ldots, \beta_s\}$.]

+ A. Gill, "Synthesis of Probability Transformers," Journal of the Franklin Institute (July 1962).


There is a probability distribution vector defined over the input alphabet,

$$\pi = \{\pi_1, \pi_2, \ldots, \pi_r\}; \qquad \pi_i = P\{u_t = \alpha_i\} > 0.$$

The inputs in the sequence are independent and identically distributed.

For any ordered pair of states $(\sigma_i, \sigma_j)$, $\sigma_i, \sigma_j \in S$, there exists a subset of $U$ (possibly empty), say $U_{ij}$, such that $\sigma_i$ will be mapped into $\sigma_j$ under $U_{ij}$. We denote this by

$$\sigma_i \xrightarrow{\;U_{ij}\;} \sigma_j.$$

Let $s_t$ be the state of the machine at time $t$. Then

$$P\{s_{t+1} = \sigma_j \mid s_t = \sigma_i\} = p_{ij} = \sum_{\alpha_k \in U_{ij}} \pi_k.$$

This allows us to construct a stochastic matrix,

$$P = (p_{ij}).$$

From Markov chain theory we have that $P$ describes a regular Markov chain if and only if there exists $n$ such that $P^n$ is a positive matrix. Then $\lim_{n \to \infty} P^n$ exists. We call this limit $T$. Further, it has the property that all its row vectors are the same vector $F$:

$$T = \begin{pmatrix} F \\ F \\ \vdots \\ F \end{pmatrix}, \qquad \sum_i F_i = 1, \quad F_i > 0, \qquad F_i = P\{s_{t=\infty} = \sigma_i\}.$$

The vector $F$ is an eigenvector of $P$ with eigenvalue 1:

$$FP = F.$$

These relations have the implication that as time approaches infinity the probability that the system is in a given state is nonzero and independent of the initial state.
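The regularity test and the fixed vector $F$ can both be computed directly. The sketch below is one possible way to do so; the helper names, the least-squares formulation, and the two small matrices are illustrative assumptions. The power bound $(n-1)^2 + 1$ used in `is_regular` is Wielandt's bound for primitive matrices, which is not mentioned in the note.

```python
import numpy as np

def is_regular(P):
    """True if some power P^k is strictly positive.
    Powers up to (n-1)^2 + 1 suffice (Wielandt's bound)."""
    n = P.shape[0]
    Q = np.eye(n)
    for _ in range((n - 1) ** 2 + 1):
        Q = Q @ P
        if np.all(Q > 0):
            return True
    return False

def fixed_vector(P):
    """Solve F P = F together with sum(F) = 1 in the least-squares sense."""
    n = P.shape[0]
    M = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    F, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return F

# Hypothetical two-state chain, used only for illustration.
P = np.array([[0.3, 0.7],
              [0.7, 0.3]])
F = fixed_vector(P)
T = np.linalg.matrix_power(P, 50)      # approximates lim P^n

print(is_regular(P))                   # True
print(F)                               # every row of T approaches this vector
print(is_regular(np.array([[0.0, 1.0], [1.0, 0.0]])))   # False: periodic chain
```

The last line shows a chain that is strongly connected but not regular: its powers alternate between two matrices and never become strictly positive.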

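With each state is associated some output symbol $\beta_i$. Thus for each $\beta_i$ there exists a subset of states $S_{\beta_i}$ such that

$$P\{y_t = \beta_i \mid s_t \in S_{\beta_i}\} = 1.$$

Therefore,

$$P\{y_{t=\infty} = \beta_i\} = \sum_{\sigma_k \in S_{\beta_i}} P\{s_{t=\infty} = \sigma_k\} = \sum_{\sigma_k \in S_{\beta_i}} F_k.$$

We may speak of an $s$ component vector $q$ over the output where

$$q_i = P\{y_{t=\infty} = \beta_i\}.$$

From convergence considerations, for every $\epsilon > 0$ there exists $n$ such that

$$\left| P\{y_{t=n} = \beta_i\} - P\{y_{t=\infty} = \beta_i\} \right| < \epsilon;$$

i.e., if we observe the output of a machine of the aforementioned type at intervals of time sufficiently long, we can achieve an output alphabet distribution vector as near constant as we please.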

We make the following definition: two machines $A$ and $B$ are equivalent in the regular Markov sense iff for all distributions over the input alphabet the probability of the output being any member of the output alphabet is the same for $A$ and $B$. Let $\pi$ be the distribution over the input alphabet. Regular Markov equivalence will be denoted by $\{A \overset{\text{R.M.}}{\sim} B\}$. Then the above definition becomes

$$\{A \overset{\text{R.M.}}{\sim} B\} \iff \forall \pi, \; \forall i: \quad P\{y^A_{t=\infty} = \beta_i\} = P\{y^B_{t=\infty} = \beta_i\},$$

where $y^A_t$ and $y^B_t$ are the outputs from $A$ and $B$ respectively at time $t$.

Theorem:

If two deterministic machines in the regular Markov class are equivalent, then they are R.M. equivalent.

Proof:

Let $A$ and $B$ be two machines in the regular Markov class that are equivalent. Assume that the initial states of $A$ and $B$ are equivalent. Let the same input be fed to both machines. Then the outputs will be identical. Since the outputs are identical, the distributions over the outputs are the same and, hence, must tend to the same limit. Since the limiting distributions are independent of the initial state in a regular Markov chain, the limiting distribution attained in this way is the limiting distribution for the process. Q.E.D.

The following example of two machines, $A$ and $B$, illustrates that the converse of this theorem does not hold, i.e., machines can be R.M. equivalent but not equivalent in the Moore sense.

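Example:

$$U = \{0, 1\}, \qquad Y = \{0, 1\}, \qquad S = \{1, 2\}.$$

[Figure: state diagrams of machines $A$ and $B$.]

Let $P\{u_t = 1, \forall t\} = p$ and $P\{u_t = 0, \forall t\} = 1 - p$. Then

$$P_A = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}, \qquad P_B = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$$

are the stochastic matrices for $A$ and $B$ (rows and columns indexed by the states 1, 2). Let $F^A$ and $F^B$ be their respective fixed vectors:

$$F^A P_A = F^A, \qquad F^A_1 + F^A_2 = 1,$$

$$F^B P_B = F^B, \qquad F^B_1 + F^B_2 = 1.$$

A solution to the above set of equations yields

$$F^A_1 = F^A_2 = F^B_1 = F^B_2 = \tfrac{1}{2}.$$

Thus, $A$ and $B$ are R.M. equivalent but Moore distinguishable.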

A word is in order regarding the conditions on a machine such that it falls into the class of regular Markov machines. A transliteration of the regular Markov property into machine structure provides us with the following:

Definition:

A machine may generate a regular Markov chain iff there exists an integer $k$ such that any state is reachable from any other state in $k$ transitions.

It is obvious that the above definition implies strong connectedness. This allows us, when dealing with machines of the regular Markov type, to speak of equivalence under a simple experiment.

A less stringent and possibly more useful definition of equivalence is one demanding equivalence only under a particular input distribution. We propose the following.

Definition:

Two machines are R.M. equivalent under $\pi$ iff, for all $i$,

$$P\{y^A_{t=\infty} = \beta_i\} = P\{y^B_{t=\infty} = \beta_i\}.$$

Denote this equivalence by $\{A \overset{\text{R.M.}}{\underset{\pi}{\sim}} B\}$. Then,

$$\{A \overset{\text{R.M.}}{\underset{\pi}{\sim}} B\} \iff \forall i: \quad P\{y^A_{t=\infty} = \beta_i\} = P\{y^B_{t=\infty} = \beta_i\}.$$

A simple test for equivalence here is just to solve the homogeneous equations for each machine,

$$F^A P_A = F^A, \qquad F^B P_B = F^B,$$

and compare output probabilities. This definition permits one to draw from a much larger class of machines and provides a very simple test for equivalence.

Conditions on a Machine such that it Generates a Transition Matrix over the Output:

Consider the partition generated by the requirement that each state in a class generate the same output. The class defined for each $\beta_i$ becomes $S_{\beta_i}$. If we can say further that

$$P\{s_{t+1} \in S_{\beta_i} \mid s_t = \sigma_p \in S_{\beta_j}\} = P\{s_{t+1} \in S_{\beta_i} \mid s_t = \sigma_q \in S_{\beta_j}\} \qquad \forall\, \sigma_p, \sigma_q \in S_{\beta_j},$$

we may speak of the stationary transition probability from the set $S_{\beta_j}$ to the set $S_{\beta_i}$,

$$P\{y_{t+1} = \beta_i \mid y_t = \beta_j\} = P\{s_{t+1} \in S_{\beta_i} \mid s_t \in S_{\beta_j}\}.$$



The  transition  matrix  P  on  the  output  is 


f 


Pi  P,  ...  Ps 


PY 


P.  V 


{p{wsp. 


V  v* 

j 


=  y  p ...  y s_ . 

ij  *  p, 


j£Sp 

j 

This  establishes  a  machine  (M1,  say)  with  s  states.  For 

o\,cr,  €  M',  we  make  the  correspondences 
^  J 


% 


v  v 


Select  U. .  =  {a,, ....  a.  }  1  <  r  subject  to  the  condition, 
ll  1  I  — 


X  P'v-i’  =p(Wse.lv  V’ 

a.€  U. .  1  3 

i  ij 

For  Vir,6  S.  ,  V  o-„  c  SQ  ,  U..  is  all  a.  such  that 

J  Pj  i  ij  i 


=  g(o'i>  Qj)  • 


Regular Markov Mealy Machines:

It is clear that the state structure requirement on the Mealy machine for regular Markov chain generation is the same as that for the Moore machine. The procedure for computing the output alphabet probabilities differs somewhat. Consider the set of all those states $S_{\beta_i}$ which could, in conjunction with an appropriate input, give rise to $\beta_i$. Let the set of inputs that produces $\beta_i$ for given state $\sigma_k$ be designated $U_{k\beta_i}$. Then

$$P\{y_{t=\infty} = \beta_i\} = \sum_{\sigma_k \in S_{\beta_i}} \; \sum_{\alpha_j \in U_{k\beta_i}} F_k\,\pi_j.$$

Since there exists an equivalent Moore machine for any Mealy machine,+ all evaluations for outputs may be carried out for the Moore machine equivalent. The above theorem assures that the output probabilities will be equal. Hence, the Mealy machine case does not pose a distinctly different problem.

+ A. Gill, "Comparison of Finite-State Models," IRE Trans. on Circuit Theory, CT-7 (1960), pp. 178-179.
ANOTHER LOOK AT THE SAMPLING THEOREM AND ITS EXTENSIONS+

J. F. A. Ormsby


Introduction:

The sampling principle had its origins in the analytic interpolation function theory of Euler and Lagrange. Later in this century further extensions and refinements were carried out in general mathematical treatments by a number of investigators.++ The connection of these treatments with the sampling principle in signal theory as stated by Shannon [1], [2], is the cardinal series given by

$$f(t) = \sum_{n=-\infty}^{\infty} f\!\left(\frac{n}{2W}\right) \frac{\sin \pi(2Wt - n)}{\pi(2Wt - n)}. \qquad (1)$$

The properties of this series in general interpolation theory are discussed for example by J. M. and E. T. Whittaker, W. L. Ferrar, and I. J. Schoenberg.

It is interesting to note that Cauchy [3], as early as 1841, verbally suggested the possibility of an extension of (1) to essentially nonuniform sampling. In this connection Yen [4] has recently detailed this possibility and generalized its statement to a group of sampling theorems for nonuniform sampling (in time).

+ This note is a somewhat condensed version of material originally presented as a term paper for the course Elect. Engr. 298 given by Prof. C. A. Desoer at U. C. in the Fall, 1960, semester. A few later references have been added.

++ These include Ch. J. de la Vallee Poussin, E. T. and J. M. Whittaker, W. A. Jenkins, T. N. E. Grenville, I. J. Schoenberg, W. L. Ferrar, and N. E. Norlund. The book by J. F. Steffensen, Interpolation, New York: Chelsea, 1950, summarizes some of this work.
Although earlier suggestions of the implications of the sampling principle in communication theory were made by Nyquist and Gabor, its emergence as a widely used and accepted concept awaited Shannon. Its prominence then coincided in time with the recognition of the stochastic estimation concepts advanced by Wiener. These two areas find a mutual application, for example, in the consideration of discrete parameter (time) series. An interpretation of equation (1) in a stochastic sense was treated in a rigorous fashion recently by Balakrishnan [5], showing that pointwise limits carry over to limits in the mean.+

We are concerned here with the sampling principle and its generalizations in the nonstochastic case, that is, for a (single) sure signal. However, the stochastic interpretation (as defined in [5]) of such results is direct.

In reviewing recent papers on the sampling theorem and its generalizations in the information theory literature, one quickly becomes aware of the variety of approaches used for analysis. Indeed, the proofs are in large part specialized and very often cumbersome. For example, one approach frequently used sets up spectra models with spectra constraints to prevent frequency aliasing. The satisfying of the resulting conditions on the (time) samples then provides the desired generalization.

Our purpose here will be to exhibit an approach, or at least suggest use of a point of view, which encompasses all these generalizations and suggests additional extensions. Along with this discussion, our note will also serve as a review of recent contributions in sampling theory.

+ For a very extensive treatment of sampling in the stochastic case, see S. P. Lloyd, "A Sampling Theorem for Stationary (Wide Sense) Stochastic Processes," Trans. Amer. Math. Soc., Vol. 92 (July 1959), pp. 1-12 (also as Bell System Monograph 3433).
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However,  before  entering  upon  this  it  seems  appropriate  to 
present  a  short  review  of  the  basic  sampling  theorem  to  place  the 
extensions  in  the  proper  setting. 

Review of the Basic Sampling Theorem:

The basic sampling principle (for uniform sampling) can be
restated from (1) for comparison with more general forms as

$$ f(t) = \sum_{n=-\infty}^{\infty} f(n\Delta t)\, \frac{\sin\left[(\pi/\Delta t)(t - n\Delta t)\right]}{(\pi/\Delta t)(t - n\Delta t)} \tag{2} $$

where $\Delta t$ is the sampling (time) interval. For $\Delta t = 1/2W$, equation
(2) gives a unique error free representation of f(t) when the
spectrum of f(t) is confined to the frequency interval (-W, W).
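As a rough numerical illustration of equation (2), the following Python sketch truncates the series and compares it with the signal itself. The band limit `W = 1`, the test signal (a shifted squared sinc, whose spectrum is a triangle confined to (-W, W)), the evaluation time, and the truncation length are all assumptions chosen for the example, not part of the original note.

```python
import math

def sinc(x):
    """sin(x)/x, with the removable singularity at x = 0 filled in."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x

W = 1.0              # assumed band limit
dt = 1.0 / (2 * W)   # sampling interval of equation (2)

def f(t):
    """Assumed test signal; its spectrum is a triangle confined to (-W, W)."""
    return sinc(math.pi * W * (t - 0.3)) ** 2

def reconstruct(t, n_terms):
    """Equation (2) truncated to |n| <= n_terms."""
    return sum(
        f(n * dt) * sinc(math.pi / dt * (t - n * dt))
        for n in range(-n_terms, n_terms + 1)
    )

t = 0.7  # a time that is not a sample time
error = abs(reconstruct(t, 400) - f(t))
print(error)
```

The residual is a truncation effect only: the samples of this signal decay like 1/n^2, so the discarded tail of the series is small.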

We  now  briefly  motivate  (2)  by  two  formal  approaches 
(a) and (b) and a more rigorous treatment (c). Questions of rigor
in  (a)  and  (b)  enter  in  terms  of  interchanges  of  limits  (in  sums 
and  integrals).  However,  since  (a)  is  the  more  conventional  approach 
and  (b)  has  the  appeal  of  operator  manipulation,  we  include  them. 

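(a) Let

$$ F(\omega) = \int_{-\infty}^{\infty} e^{-i\omega t} f(t)\, dt, \qquad F(\omega) = 0 \ \text{for}\ |\omega| > 2\pi W. $$

Expanding $F(\omega)$ in a Fourier series,

$$ F(\omega) = \begin{cases} \displaystyle\sum_{n=-\infty}^{\infty} a_n e^{i\omega n/2W}, & |\omega| \le 2\pi W \\ 0, & |\omega| > 2\pi W \end{cases} $$

with

$$ a_n = \frac{1}{4\pi W} \int_{-2\pi W}^{2\pi W} F(\omega)\, e^{-i\omega n/2W}\, d\omega . $$

From

$$ f(t) = \frac{1}{2\pi} \int_{-2\pi W}^{2\pi W} F(\omega)\, e^{i\omega t}\, d\omega $$

we have on letting $t = -n/2W$,

$$ a_n = \frac{1}{2W}\, f(-n/2W). $$

Taking inverse Fourier transforms on the series expansion of $F(\omega)$,
we get

$$ f(t) = \sum_{n=-\infty}^{\infty} 2W a_n\, \frac{\sin \pi(2Wt+n)}{\pi(2Wt+n)} = \sum_{n=-\infty}^{\infty} f(n/2W)\, \frac{\sin \pi(2Wt-n)}{\pi(2Wt-n)} . $$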

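(b) Write

$$ F(f) = \hat{F}(f)\,[\tau\, \mathrm{rect}(f/2W)], \qquad \tau = 1/2W, \qquad f = \omega/2\pi, $$

and

$$ \hat{F}(f) = \sum_{n=-\infty}^{\infty} \frac{1}{\tau}\, F\!\left(f - \frac{n}{\tau}\right). $$

Taking inverse Fourier transforms $\mathcal{F}^{-1}$, we have with $\mathcal{F} g(t) = F(f)$,

$$ g(t) = \mathcal{F}^{-1}\{\hat{F}(f)\,[\tau\, \mathrm{rect}(f/2W)]\} = [\mathcal{F}^{-1}(\hat{F}(f))] * \mathrm{sinc}\, 2\pi W t, $$

where the symbol * stands for the convolution operation. Now

$$ \hat{F}(f) = F(f) * \frac{1}{\tau} \sum_{n=-\infty}^{\infty} \delta\!\left(f - \frac{n}{\tau}\right), $$

so that

$$ \mathcal{F}^{-1}(\hat{F}(f)) = (\mathcal{F}^{-1} F(f)) \cdot \sum_{m=-\infty}^{\infty} \delta\!\left(t - \frac{m}{2W}\right). $$

We have, therefore,

$$ g(t) = \left[ g(t) \sum_{n=-\infty}^{\infty} \delta\!\left(t - \frac{n}{2W}\right) \right] * \mathrm{sinc}\, 2\pi W t = \sum_{n=-\infty}^{\infty} g(n/2W)\, \mathrm{sinc}\!\left[2\pi W\!\left(t - \frac{n}{2W}\right)\right]. $$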

(c)  This  approach  of  expanding  in  a  Fourier  series  is  due 

to  Parzen.  The  style  of  proof  given  here  is  that  found  in  Balakrishnan 
[5]  or  more  recently  Beutler  [6]. 

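Again using

$$ g(t) = \int_{-W}^{W} e^{i2\pi f t} F(f)\, df, $$

we expand

$$ e^{i2\pi f t} = \sum_{n=-\infty}^{\infty} a_n(t)\, e^{i\pi n f/W}, \qquad f \in (-W, W) $$

with

$$ a_n(t) = \frac{1}{2W} \int_{-W}^{W} e^{i2\pi f t}\, e^{-i\pi n f/W}\, df = \frac{\sin \pi(2Wt-n)}{\pi(2Wt-n)} . $$

Thus,

$$ g(t) = \int_{-W}^{W} \left( \lim_{N\to\infty} \sum_{n=-N}^{N} a_n(t)\, e^{i\pi n f/W} \right) F(f)\, df . $$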
Since $e^{i2\pi f t}$ is absolutely continuous for $f \in (-W, W)$,
$\sum_{n=-N}^{N} a_n(t)\, e^{i\pi n f/W}$ converges boundedly. Denoting
$Y_N = \sum_{n=-N}^{N}$, we have $Y_N \to Y$ with $Y_N$ bounded for all N.

Using the zero measure of the singletons {-W} and {W} on the
f line to give $\int_{f \in (-W, W)} = \int_{f \in [-W, W]}$ and applying the dominated
convergence theorem we have


$$ \begin{aligned}
g(t) &= \int_{-W}^{W} \left( \lim_{N} Y_N \right) F(f)\, df
      = \lim_{N} \int_{-W}^{W} \left( \sum_{n=-N}^{N} a_n(t)\, e^{i\pi n f/W} \right) F(f)\, df \\
     &= \lim_{N} \sum_{n=-N}^{N} a_n(t) \int_{-W}^{W} e^{i\pi n f/W} F(f)\, df
      = \sum_{n=-\infty}^{\infty} g(n/2W)\, a_n(t) \\
     &= \sum_{n=-\infty}^{\infty} g(n/2W)\, \frac{\sin \pi(2Wt-n)}{\pi(2Wt-n)} .
\end{aligned} $$


Recent  Generalizations  of  the  Sampling  Theorem: 

Two basic types of generalizations of (2) involve considerations
on what samples are taken and when the samples are taken.
We call these Types I and II. For identification purposes with
respect to the approach of the next section, we will also label a
Type III.

Type I: This generalization of (2) substitutes more sample
values for fewer sample times by using samples not only of the
function itself but also its derivatives. Using up to the K-th order
derivative, its form can be given as
f(t)  = 


d*  f(t) 

dt1 


)jC+l 


t=nAt 


(3) 


where $\Delta t = (K+1)/2W$ and f(t) is bandlimited to (-W, W) as in (2).

Shannon [1] first remarked about the possibility of such a
generalization. Then Fogel [7] supplied a spectrum argument justifying
the conjecture using considerations for specifying the spectrum
uniquely in terms of a solution to a system of equations. The approach,
in fact, involved inducing spectrum folding. Later, Fogel and
Jagerman [8] gave a statement of (3) for K=1 in connection with
extending (2) for t, a complex variable. More recently, Abramson
and Linden [9] gave (3) for arbitrary K. However, their approach
and method of proof are cumbersome. They assume the result and
show its validity by proving the difference between the two sides of
(3) is zero. The proof rests on showing that a certain function is
identically zero if it has a bandlimited spectrum and if its first K
derivatives are zero at the sample points. It involves the detail of a
Vandermonde matrix analysis.

Type II: This generalization involves taking sample values
at nonuniform times (or at least not entirely uniform). The first
particular result along this line appears to be the work of Kohlenberg
[10] in connection with bandpass signal spectra. The resulting
sample times have a high degree of uniformity in this case.

Although a sampling theorem, if it exists, could be written
in a form reflecting the particular sampling time pattern for each
case considered, a general principle can be given following Yen
[4]. The principle is stated by Yen as a generalization of his
Theorems I and III. Based on this, its validity would appear restricted
to the cases covered by just these theorems. The principle, which indeed
is an extension of the early comment of Cauchy [3], can be given as
follows.

If a signal is a magnitude-time function and if time is divided into
equal  intervals  of  T  seconds  where  T=N/2W  and  N  instantaneous 
samples  are  taken  from  each  interval  in  any  manner,  then  a  knowledge 
of  the  magnitude  of  each  sample  and  the  instant  at  which  each  sample 
is  taken  determines  the  original  signal  uniquely.  If  the  number  of  sam¬ 
ples  is  less  than  that  stated  above  in  any  interval,  the  signal  becomes 
underspecified, i.e., additional conditions must be employed before
the  signal  is  determined  uniquely.  On  the  other  hand,  if  the  sample 
number  is  more  than  N  in  any  interval,  the  signal  becomes  overspeci¬ 
fied,  that  is,  the  sample  values  cannot  be  arbitrarily  assigned,  but 
must  satisfy  a  certain  number  of  consistency  conditions.  The  signal 
in  the  above  is  bandlimited  to  (-W,  W). 

The  approach  to  be  suggested  in  the  next  section  allows  Type 
II  generalization  to  follow  naturally  without  necessarily  tying  it  to  a 
particular  structure  such  as  in  Yen.  Further  the  approach  of 
Kohlenberg,  like  Fogel  for  Type  I,  contrived  the  required  system  of 
equations  on  the  spectrum.  In  reference  [10]  a  double  sequence  of 
sampling  points  (second  order  sampling)  is  introduced.  In  the  next 
section  an  attempt  is  made  to  eliminate  the  need  for  these  particular 
contrivances  by  presenting  a  unified  point  of  view. 

Type III: As noted above, Fogel and Jagerman [8] considered
extending the basic theorem (2) (and (3) for K=1) to the case where
t  is  complex.  The  justification  applying  contour  integration  is  indeed 
cumbersome.  Such  an  extension  and  the  contour  integral  equivalent 
appear  naturally  in  the  approach  to  be  discussed. 


In  closing  this  section,  a  few  additional  remarks  can  be  made 
to  fill  out  the  picture  of  extensions  in  sampling  theory.  Aside  from  the 
justification  of  (2)  in  the  stochastic  case  by  Balakrishnan  who  inci¬ 
dentally  considers  both  absolutely  and  nonabsolutely  continuous  spectra, 
other  extensions  of  sampling  methods  can  be  mentioned.  These  include 
random  time  sampling. 

If  the  sample  time  differences  are  not  independent  (statistically), 
then  errors  may  result.  Consider  for  example  the  case  where  the  average 
sample  rate  is  proper  (to  prevent  nonarbitrariness  or  folding)  but  there 
exists  a  random  jitter.  The  jitter  at  successive  times  of  sampling  is 
taken  independent  but  the  difference  in  sample  times  is  not  indepen¬ 
dent.  Then  the  resulting  power  spectrum  is  attenuated  by  a  factor 
which  is  the  square  of  the  characteristic  function  of  the  sample  time  jit¬ 
ter  component  distribution  [11],  [12]. 
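The attenuation statement can be made concrete with a small Monte Carlo sketch in Python. It assumes zero-mean Gaussian jitter with standard deviation `sigma` and a single sinusoidal component at angular frequency `omega`; neither assumption comes from the note or from [11], [12]. For such jitter the characteristic function is exp(-sigma^2 omega^2 / 2), the mean of the jittered component is scaled by that value, and its power by the square.

```python
import math
import random

random.seed(0)        # fixed seed so the run is repeatable

sigma = 0.1           # assumed jitter standard deviation
omega = 10.0          # assumed angular frequency of the component
trials = 100_000

# Average of cos(omega * jitter): the factor by which the mean (coherent)
# part of a sinusoid sampled at jittered times is scaled.
mean_cos = sum(
    math.cos(omega * random.gauss(0.0, sigma)) for _ in range(trials)
) / trials

# Characteristic function of the assumed Gaussian jitter at omega.
char_fn = math.exp(-0.5 * (sigma * omega) ** 2)

power_factor_simulated = mean_cos ** 2
power_factor_predicted = char_fn ** 2
print(power_factor_simulated, power_factor_predicted)
```

The two printed factors should agree to within the Monte Carlo sampling error.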

Recently Shapiro and Silverman [13], treating the case of independent
sample time differences, produced sampling without spectrum
folding (aliasing) when no restrictions are placed on the sampling
interval in terms of the bandwidth limitation. The sample time difference in
this case followed a Poisson distribution. This result hinges on the
characteristic function of the sample difference times being single
valued. The close connection of this Fourier transform (characteristic
function) and the Fourier transform of the quantity $\phi_k(\lambda)$ in the extended
fundamental form (see equation (7)) discussed in the next section is to
be noted. Schoenberg [14] takes the Fourier transform of the Lagrange
coefficient $\phi_k(\lambda)$, calling this the characteristic function, and
establishes a differentiation between smoothing (interpolation error) and
exact interpolation on the basis of the range and values taken by this
characteristic function.

These connections to $\phi_k(\lambda)$ are not surprising since, after all,
the basic ingredients must lie in the $\phi_k(\lambda)$ or the distribution of the
$\lambda_k$ (see next section) since the rest of the extended fundamental form
(see equation (7)) involves the arbitrary function $f(\lambda)$.
Finally we may consider sampling theorems for multidimensional
signals. For example, Kailath [15] has specified sampling formulae
applied to a function of two variables with frequency restrictions on one
or both of these variables. He chose for convenience to use the Hilbert
transform form as used by Woodward [16]. The convenience but lack of
direct interpretation of this approach can be illustrated, for example, by
comparing the difficulty in obtaining forms of Kohlenberg [10] completely
defined in time and signal values with those of Woodward based
on Hilbert transforms.

In  dealing  with  multidimensional  (multivariable)  sampling,  one 
could  also  seek  theorems  reflecting  the  correlation  between  the  compo¬ 
nents  (variables).  A  general  interpolation  form  in  more  than  one  vari¬ 
able  as  a  further  extension  of  formula  (7)  of  the  next  section  should  then 
be  able  to  encompass  sampling  in  multidimensional  cases  in  the  natural 
way  that  (7)  does  for  one  dimensional  sampling. 

Another  Approach  to  Sampling  Theorems: 

Considering  the  great  variety  and  detail  in  approach  together 
with  the  arbitrariness  and  cumbersome  nature  of  the  proofs  as 
indicated  in  the  last  section  for  the  various  theorems,  it  could  be 
hoped  that  another  more  general  point  of  view  might  exist  which 
would  encompass  the  various  generalizations  and  by  its  generality 
reduce  the  need  for  specialized  detail.  The  following  approach  seems 
to  offer  a  solution. 

From general interpolation theory and related results* on
interpolating polynomials and their connection with the minimum and
characteristic polynomials, there emerges an approach which encompasses
the basic theorem (2) and, for example, its Type I, II, and
III extensions.

*The material of interest here is part of the spectral theory of linear
operators. The area of formulae of specific concern involving functions
of matrices was presented as part of the seminar course of Prof. Desoer
already noted. This material is also found in reference [17] especially
sections 18 and 19.


It  will  not  be  the  intent  of  this  note  to  justify  the  steps  taken 
to  extend  the  interpretation  of  the  fundamental  formulae  given  below 
together  with  its  equivalents  and  needed  properties  in  order  to  associ¬ 
ate  with  the  requirements  of  the  sampling  theory.  However,  the  ap¬ 
proach  will  be  outlined  in  some  detail  and  examples  of  its  application 
to  obtaining  Type  I,  II,  and  III  generalizations  will  be  noted.  In 
addition, its use on other generalizations and its potential to extend
even further will be discussed.


Our starting point is the fundamental formulae given by

$$ f(A) = \sum_{k=1}^{s} \sum_{i=0}^{m_k - 1} \frac{f^{(i)}(\lambda_k)}{i!}\, (A - \lambda_k I)^i\, E_k $$

where (i) $\lambda$ is, in general, a complex variable

(ii) A is some matrix (or equivalently, a linear transformation
in finite dimensional space) with specified spectrum
(eigenvalues) $\{\lambda_1, \lambda_2, \ldots, \lambda_s\}$.

(iii) f is some function analytic on an open set containing
$\{\lambda_k\}_1^s$

(iv) $m_k$ is the multiplicity of $\lambda_k$, k = 1, ..., s.

(V)  E,  =  <j>  (X.)  =  ^  n  (X.)  with  n,(X.)  a  polynomial 


of  order  m  -1  in  X.  and  X,  and  ^  (M  the  minimum  poly- 

X  1C 

nomial.  has  the  range  of  the  generalized  null  space 
associated  with  X^  and  in  fact  is  the  projection  operator 
_ into  this  null  space. 

See  the  previous  footnote. 
We first take $A \to \lambda$ to get

I  St — -  *k<M 

k=l  i  =0  \  i 


From  the  theory  we  may  also  write,  after  taking  A 

rn  -1 
s  k 


X  X  < » =  sr  /  ^ 


k=l  i  =0 


where  C  encloses  the  spectrum  inside  the  domain  of  analyticity  of 

f  and  p(  X)  =  — .  Finally  from  the  general  structure  after  again 
o'  -  X 

taking  A  — >  X, 


$$ f(\lambda) \equiv 0 \iff f^{(i)}(\lambda_k) = 0, \qquad i = 0, 1, \ldots, m_k - 1; \quad k = 1, 2, \ldots, s \tag{7} $$

With  these  results,  we  now  proceed  to  apply  this  theory  to  the 
sampling  problem.  The  basic  theorem  (2)  as  well  as  Type  I,  II,  and 
III  generalizations  can  proceed  directly  on  a  formal  basis  by  an  ex¬ 
tension  of  (6). 

We next renumber the $\lambda_k$ so that $(s-1)/2 = N$ for s odd and
$s/2 - 1 = N$ for s even so that with $N \to \infty$, we write


rn  -1 
k 


X  X 

k=-oo  k=0 


f(i  Vk) 


+k<x>-zkf>  ToTd(r 


where $C_\infty$ is some contour taken as a limit of $C_N$ as $N \to \infty$,
with $C_N$ enclosing the spectrum $\{\lambda_k\}_{-N}^{N}$.

To have a closer connection with the nomenclature appearing
in sampling theory literature, we let $\zeta = \sigma$ and $z = \lambda$. In applying (7)
for the basic sampling theorem, we take $m_k = 1$ for all k, corresponding
to first order sampling where only one sample value is
taken at each sample time. The association between eigenvalues and
complex sample times is made. Before exhibiting the form of $\phi_k(z)$,
we note the following properties of $\phi_k(z)$ which make the representations
unique for each spectrum $\{z_k\}$ chosen, as a result of a
requirement of the minimum degree polynomial:

w  • i!  ♦>?  V  -  °> 1  *»-2 . “k.i 

<t>k  \z  )  =  0;  i  =  1,2,  ... 

j  -  1,  2,  1  •  •  ,  k"l,  k+lf  *  •  •  ,  s 
=  -N,  .  .  ♦  ,  k-lf  k+1,  .  .  .  ,  N 
with  N - >  00  in  our  case. 


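With $m_k = 1$, we have,

$$ f(z) = \sum_{k=-\infty}^{\infty} f(z_k)\, \phi_k(z) . $$

Now, for $z_j = jh$, h complex,

$$ \begin{aligned}
\phi_k(z) &= \frac{\prod_{j=-\infty}^{\infty}{}' \,(z - z_j)}{\prod_{j=-\infty}^{\infty}{}' \,(z_k - z_j)}
           = \frac{\prod_{j=-\infty}^{\infty} (z - z_j)}{(z - z_k) \prod_{j=-\infty}^{\infty}{}' \,(z_k - z_j)}
           = \frac{\sin (\pi/h) z}{(z - kh)\, \left[\sin (\pi/h) z\right]'_{z = kh}} \\
          &= \frac{(-1)^k \sin (\pi/h) z}{(\pi/h)(z - kh)}
           = \frac{\sin (\pi/h)(z - kh)}{(\pi/h)(z - kh)} ,
\end{aligned} $$

where $\prod'$ means $j \ne k$, in $\phi_k(z)$.

Therefore,

$$ f(z) = \sum_{k=-\infty}^{\infty} f(kh)\, \frac{\sin (\pi/h)(z - kh)}{(\pi/h)(z - kh)} $$

which is (2) form.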
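The passage from the product form of the Lagrange coefficient to the sinc kernel can be checked numerically. The Python sketch below is an illustration under stated assumptions: h is taken real (the text allows it to be complex), and the values of `h`, `z`, `k`, and the truncation `n_max` are arbitrary choices. It evaluates the product over j from -n_max to n_max and compares it with the limiting form.

```python
import math

def lagrange_coefficient(z, k, h, n_max):
    """Truncated product over j = -n_max..n_max, j != k, of (z - j*h)/(k*h - j*h)."""
    value = 1.0
    for j in range(-n_max, n_max + 1):
        if j != k:
            value *= (z - j * h) / ((k - j) * h)
    return value

def sinc_kernel(z, k, h):
    """Limiting form sin((pi/h)(z - k*h)) / ((pi/h)(z - k*h))."""
    x = math.pi / h * (z - k * h)
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x

h = 0.5     # assumed (real) sample spacing
z = 0.15    # assumed evaluation point, not a sample point
k = 2       # index of the coefficient being checked

truncated = lagrange_coefficient(z, k, h, 5000)
limit = sinc_kernel(z, k, h)
gap = abs(truncated - limit)
print(truncated, limit, gap)
```

The gap shrinks roughly like 1/n_max, which reflects the slow, conditional character of the infinite product.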


In  view  of  the  later  discussion  concerning  Type  III  generaliza¬ 
tion,  we  note  that  this  result  is  immediately  equal  to 


1 

2iri 


m 


d? 


where $C_\infty$ encloses $\{z_k\}_{-N}^{N}$ as $N \to \infty$.

The application of (7) to Type I generalization carries over
in the same way. We illustrate for $m_k = 2$ (i.e., i = 0, 1) to give
repeated sample points. Then $\phi_k(z)$ becomes the square of its value
in the basic theorem. Also with i = 0, 1 both $f(z_k)$ and $f^{(1)}(z_k)$
appear. We have at once, using

$$ \phi_k(z) \to \left[ \frac{\sin (\pi/h)(z - kh)}{(\pi/h)(z - kh)} \right]^2 \quad \text{as } N \to \infty, $$

$$ f(z) = \sum_{k=-\infty}^{\infty} \left[ f(kh) + (z - kh)\, f^{(1)}(kh) \right] \left[ \frac{\sin (\pi/h)(z - kh)}{(\pi/h)(z - kh)} \right]^2 $$

which is (3) form for K=1. The form for the K-th order derivative
case as given in (3) is just as simply obtained.
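A short Python sketch of the K = 1 form may help fix ideas. It assumes a band limit `W = 1`, the spacing `h = (K+1)/2W = 1/W`, and a shifted squared-sinc test signal whose derivative is available in closed form; these choices, the evaluation point, and the truncation length are illustrative assumptions only.

```python
import math

def sinc(x):
    """sin(x)/x, with the removable singularity at x = 0 filled in."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x

def dsinc(x):
    """Derivative of sin(x)/x (series value used very close to 0)."""
    if abs(x) < 1e-6:
        return -x / 3.0
    return (x * math.cos(x) - math.sin(x)) / (x * x)

W = 1.0          # assumed band limit
h = 1.0 / W      # spacing (K+1)/2W with K = 1
SHIFT = 0.3      # assumed offset so the samples are not trivially zero

def f(t):
    """Assumed test signal, bandlimited to (-W, W)."""
    return sinc(math.pi * W * (t - SHIFT)) ** 2

def df(t):
    """Closed-form first derivative of the test signal."""
    u = math.pi * W * (t - SHIFT)
    return 2.0 * sinc(u) * dsinc(u) * math.pi * W

def reconstruct_with_derivative(t, n_terms):
    """K = 1 form: sum of [f(kh) + (t - kh) f'(kh)] * sinc^2, truncated."""
    total = 0.0
    for k in range(-n_terms, n_terms + 1):
        kernel = sinc(math.pi / h * (t - k * h)) ** 2
        total += (f(k * h) + (t - k * h) * df(k * h)) * kernel
    return total

t = 0.45
error = abs(reconstruct_with_derivative(t, 400) - f(t))
print(error)
```

Note that the sample spacing here is twice that of the basic theorem; the derivative values make up for the samples that are no longer taken.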

As noted previously, the detailed proof of Abramson and
Linden [9] involved showing that a function is identically zero if
it has a bandlimited frequency spectrum and if its first K derivatives
are zero at the sample points. Such a result appears direct from the
general interpolation structure as given by (7). The bandlimited property
is related to the analyticity of f and the spacing between successive
$z_k$, specifying h.

See footnote on previous page.


We  now  turn  to  Type  II  generalization.  Since  the  problem  of 
unique  determination  for  the  various  nonuniform  sample  point  distri¬ 
butions  can  be  answered  in  a  natural  way  using  the  general  inclusive 
form  (7),  we  need  only  turn  to  the  particular  expression  which  this 
form  produces  for  a  particular  case.  We  give  as  an  example  the  case 
of  Theorem  I  of  Yen  [4]  (this  case  together  with  his  Theorem  III 
constituted  the  area  of  application  given  by  reference  [4]  for  Type  II 
as  noted  earlier).  The  development  in  [4]  is  tedious  and  indirect 
lacking  the  natural  development  of  the  form  of  the  sampling  formula 
using  the  point  of  view  suggested  by  (7). 

Treating the real time case, we take z = t and consider N
sample times altered from an otherwise uniform sampling with
h = 1/2W. The uniform sample times are at n/2W, while the altered
times are at $t_p$, p = 1, 2, ..., N. The holes left by the altered times
can be given as $n_q/2W$, q = 1, 2, ..., N, using the notation of Yen. To
correspond more closely to his notation, we take k = m. Now for all the
nonuniform sampling situations treated by Yen and Kohlenberg, our $m_k = 1$.

We then have from (7),

$$ f(t) = \sum_{m=-\infty}^{\infty} f(t_m)\, \frac{\prod_j{}' \,(t - t_j)}{\prod_j{}' \,(t_m - t_j)} $$

where $\prod_j' (t - t_j) / \prod_j' (t_m - t_j)$, with the products taken over all
sample times $t_j$ other than $t_m$, is the $\Psi_m(t)$ of Yen and

$t_m = n/2W$ for a uniform sample time
$\;\;\;\; = t_p$ for a nonuniform sample time.

P 
The expression on the right-hand side is the general form for this type
of sampling. The $\Psi_m(t)$ can be rewritten in a form more analogous to
the form of equation (2) as given by Yen. We do this in cases (a) and
(b) below.

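Case (a): let $t_m = t_p$.

Now,

$$ \prod_j{}' \,(t - t_j) = \frac{\prod_{n=-\infty}^{\infty} (t - \tfrac{n}{2W})\; \prod_{q=1,\, q \ne p}^{N} (t - t_q)}{\prod_{q=1}^{N} (t - \tfrac{n_q}{2W})} . $$

Again,

$$ \prod_j{}' \,(t_p - t_j) = \frac{\prod_{n=-\infty}^{\infty} (t_p - \tfrac{n}{2W})\; \prod_{q=1,\, q \ne p}^{N} (t_p - t_q)}{\prod_{q=1}^{N} (t_p - \tfrac{n_q}{2W})} . $$

Then,

$$ \begin{aligned}
\frac{\prod_j{}' \,(t - t_j)}{\prod_j{}' \,(t_p - t_j)}
 &= \frac{\prod_{q=1,\, q \ne p}^{N} (t - t_q)\; \prod_{q=1}^{N} (t_p - \tfrac{n_q}{2W})}{\prod_{q=1,\, q \ne p}^{N} (t_p - t_q)\; \prod_{q=1}^{N} (t - \tfrac{n_q}{2W})}\;
    \lim_{M \to \infty} \frac{\prod_{n=-M}^{M} (t - \tfrac{n}{2W})}{\prod_{n=-M}^{M} (t_p - \tfrac{n}{2W})} \\
 &= \frac{\prod_{q=1,\, q \ne p}^{N} (t - t_q)\; \prod_{q=1}^{N} (t_p - \tfrac{n_q}{2W})}{\prod_{q=1,\, q \ne p}^{N} (t_p - t_q)\; \prod_{q=1}^{N} (t - \tfrac{n_q}{2W})}\;
    \frac{\sin 2\pi W t}{\sin 2\pi W t_p} .
\end{aligned} $$
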
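Case (b): let $t_m = n/2W \ne n_q/2W$.

Separating the uniform sample times from the altered ones as in case (a),

$$ \frac{\prod_j{}' \,(t - t_j)}{\prod_j{}' \,(\tfrac{n}{2W} - t_j)}
 = \frac{\prod_{q=1}^{N} (t - t_q)\; \prod_{q=1}^{N} (\tfrac{n}{2W} - \tfrac{n_q}{2W})}{\prod_{q=1}^{N} (\tfrac{n}{2W} - t_q)\; \prod_{q=1}^{N} (t - \tfrac{n_q}{2W})}\;
   \frac{(-1)^n \sin 2\pi W t}{\pi (2Wt - n)} . $$
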
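The two cases can be exercised numerically for the simplest situation, a single migrated sample time (N = 1). The Python sketch below is a formal check only, under assumptions that are not in the note: band limit `W = 1`, the uniform sample at index `HOLE = 0` moved to `T1 = 0.2`, a shifted squared-sinc test signal, and a truncated sum. `psi_migrated` follows the case (a) form and `psi_uniform` the case (b) form.

```python
import math

W = 1.0      # assumed band limit; uniform spacing is 1/(2W)
HOLE = 0     # assumed index n_1 of the removed uniform sample time n_1/2W
T1 = 0.2     # assumed migrated sample time t_1

def sinc(x):
    """sin(x)/x, with the removable singularity at x = 0 filled in."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x

def f(t):
    """Assumed test signal, bandlimited to (-W, W)."""
    return sinc(math.pi * W * (t - 0.3)) ** 2

def psi_migrated(t):
    """Case (a) with N = 1: interpolation function of the migrated point."""
    hole_time = HOLE / (2 * W)
    ratio = (T1 - hole_time) / (t - hole_time)
    return ratio * math.sin(2 * math.pi * W * t) / math.sin(2 * math.pi * W * T1)

def psi_uniform(n, t):
    """Case (b) with N = 1: interpolation function of the uniform point n/2W."""
    tn = n / (2 * W)
    hole_time = HOLE / (2 * W)
    ratio = (t - T1) * (tn - hole_time) / ((tn - T1) * (t - hole_time))
    return ratio * (-1) ** n * math.sin(2 * math.pi * W * t) / (math.pi * (2 * W * t - n))

def reconstruct(t, n_terms):
    """Sum of f(t_m) * Psi_m(t) over the migrated point and |n| <= n_terms, n != HOLE."""
    total = f(T1) * psi_migrated(t)
    for n in range(-n_terms, n_terms + 1):
        if n != HOLE:
            total += f(n / (2 * W)) * psi_uniform(n, t)
    return total

t = 0.33     # neither a sample time nor the hole
error = abs(reconstruct(t, 400) - f(t))
print(error)
```

The evaluation point must avoid the hole and the uniform sample times, where the closed forms above are defined only as limits.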


As noted earlier, what we call Type III generalization has been
considered by Fogel and Jagerman [8] for the forms (2) and (3)
with K = 1. Most of their proof using certain boundedness conditions
on f(z) involves showing that the contour integral

$$ \oint_{C_n} \frac{f(\zeta)}{(\zeta - z)\, \sin (\pi/h)\zeta}\, d\zeta \to 0 \quad \text{as } n \to \infty . $$

The detail is rather formidable and the motivation is carried out in
a somewhat heuristic way. The desired result then comes from
residue evaluation. Indeed, the appropriate restriction on f(z) for
using this approach of the Cauchy integral can be found in the
work of Paley and Wiener [18] and Levinson [19].

Thus,  the  possibility  of  proceeding  in  general  to  form  (7)  from 
(6)  and  of  avoiding  particular  contour  integration  appears  preferable. 
Aside from the possible further extensions of this approach to a
multivariable situation, as previously noted, we close this section by
raising some questions and commenting on them in a way which may
provide  clarification  and  extension  concerning  topics  either  hinted 
at  or  neglected  in  the  literature. 
Question  1:  Can  the  K+l  derivatives  (if  they  exist)  taken  at 
a  K/ 2W  spacing  be  chosen  from  any  order  of  derivatives  (as  Fogel 
seems  to  suggest)?  For  example,  can  we  take  as  the  K+l  deri¬ 
vatives  f^\  f^\  f^\  .  .  .  ,  an<^  if  so,  what  form  does  the 

interpolating  formula  take? 


Question 2: The usual form (3) places the function and its
derivative values at successive points with fixed and uniform
spacing K/2W, i.e., $f(jK/2W)$, $f^{(1)}(jK/2W)$, ..., $f^{(k)}(jK/2W)$,
at each j. Is it possible to specify these values in any other way
such as

..., $f(jK/2W)$, $f((j+1)K/2W)$, ...

..., $f^{(1)}(j(K+1)/2W)$, $f^{(1)}((j+1)(K+1)/2W)$, ...

..., $f^{(k)}(j(2K-1)/2W)$, $f^{(k)}((j+1)(2K-1)/2W)$, ...

or (K even),

$f^{(0)}$, $f^{(k)}$ at jK/2W, (j+1)K/2W, ...

$f^{(1)}$, at j(K+1)/2W, (j+1)(K+1)/2W, ..., etc.?

Question  3:  Considering  the  sampling  generalizations  Type  I  and  II 
is  it  possible  to  combine  these,  that  is,  is  it  possible  to  introduce 
derivative  values  into  nonuniform  sampling? 

Formally, at least, the general form of (7) appears to provide the
answers  to  the  above  questions  as  follows: 


Answer 1: If $f^{(i)}$ appears then so do $f^{(i-1)}, \ldots, f^{(0)}$,
so that if K+1 derivatives are to be specified this implies they
are $f^{(0)}, f^{(1)}, \ldots, f^{(k)}$.

Answer 2: Again if $f^{(i)}(z)$ is taken at $z = z_k$, so also are
$f^{(0)}, \ldots, f^{(i-1)}$.

Answer 3: Considering the possibility of having $m_k > 1$ and at
the same time choosing the $z_k$ to satisfy some nonuniform sample
time pattern, it would appear, at least formally, that a combination
of Types I and II generalizations could be developed (with the
$\phi_k(z)$ taking on their corresponding forms) into a more general sampling
theorem than has been previously proposed.
A NOTE ON FANO'S INEQUALITY
T.J. Wagner

Let X and Y be two finite-valued random variables for
which we try to deduce X from the occurrence of Y by any function
g: range of Y to range of X. Fano's inequality* states
that the probability of error $P[g(Y) \ne X]$ is related to the
conditional uncertainty I(X/Y) by

$$ I(X/Y) \le h(P[g(Y) \ne X]) + P[g(Y) \ne X]\, \log_2 [\,|X| - 1\,] $$

where

(i) $I(X/Y) = \sum_{\{(x,y):\, P(x,y) > 0\}} -P(x,y) \log_2 P(x/y)$,

(ii) $h(x) = -x \log_2 x - (1-x) \log_2 (1-x)$ for $0 < x < 1$, and $h(x) = 0$ for x = 0, 1,

and (iii) |X| denotes the number of points x in the range of X with
P(x) > 0.

We  now  prove  a  statement  which  can  serve  as  the  other  half  of 
Fano's  inequality. 

Assertion:

If a function g satisfies

(*) $P(g(y)/y) \ge P(x/y)$ for all x, y,

then

$P[g(Y) \ne X] \le I(X/Y)$.

*R.  M.  Fano,  Transmission  of  Information,  New  York:  John  Wiley  and 
Sons,  Inc.  and  the  M.  I.  T.  Press,  1961,  p.  187. 
Proof:

$$ I(X/Y) = \sum_{\{(x,y):\, P(x,y) > 0\}} -P(x,y) \log_2 P(x/y) \;\ge\; \sum_{\{(x,y):\, P(x,y) > 0,\ x \ne g(y)\}} P(x,y)\, \{-\log_2 P(x/y)\} . $$

For each x, y such that $x \ne g(y)$ and P(x, y) > 0, $-\log_2 P(x/y) \ge 1$
since $P(x/y) \le P(g(y)/y)$ and $x \ne g(y)$ imply $P(x/y) \le 1/2$. Therefore,

$$ \sum_{\{(x,y):\, P(x,y) > 0,\ x \ne g(y)\}} P(x,y)\, \{-\log_2 P(x/y)\} \;\ge\; \sum_{\{(x,y):\, P(x,y) > 0,\ x \ne g(y)\}} P(x,y) = P[g(Y) \ne X] . \quad \text{Q.E.D.} $$
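The assertion and Fano's inequality together bracket I(X/Y). The Python sketch below checks both bounds on a small joint distribution. The 3-by-2 table is invented for illustration, and g is taken as the rule that picks the most probable x for each y, which satisfies (*).

```python
import math

def binary_entropy(p):
    """h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def fano_quantities(joint):
    """joint[x][y] = P(X = x, Y = y). Returns (P_e, I(X/Y), Fano upper bound)."""
    n_x, n_y = len(joint), len(joint[0])
    p_y = [sum(joint[x][y] for x in range(n_x)) for y in range(n_y)]
    # g(y) maximizes P(x, y) over x, hence also P(x/y): condition (*).
    g = [max(range(n_x), key=lambda x, y=y: joint[x][y]) for y in range(n_y)]
    p_error = 1.0 - sum(joint[g[y]][y] for y in range(n_y))
    cond_uncertainty = -sum(
        joint[x][y] * math.log2(joint[x][y] / p_y[y])
        for x in range(n_x) for y in range(n_y) if joint[x][y] > 0.0
    )
    # |X|: number of x values with P(x) > 0.
    support = sum(1 for x in range(n_x) if sum(joint[x]) > 0.0)
    upper = binary_entropy(p_error)
    if support > 1:
        upper += p_error * math.log2(support - 1)
    return p_error, cond_uncertainty, upper

# Hypothetical joint distribution (rows: x, columns: y); entries sum to 1.
joint = [
    [0.30, 0.05],
    [0.10, 0.25],
    [0.10, 0.20],
]
p_error, cond_uncertainty, upper = fano_quantities(joint)
print(p_error, cond_uncertainty, upper)
```

For this table the error probability is 0.45, and the conditional uncertainty falls between it and the Fano bound.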


Remark:

For a finite-valued random variable X and a sequence of
finite-valued random variables $Y_1, Y_2, \ldots$, let

$\{g_n$: range of $(Y_1, \ldots, Y_n)$ to range of X$\}$

be any sequence of functions where $g_n$, for each n, satisfies
(*) with Y replaced by $(Y_1, \ldots, Y_n)$. It is straightforward to show
from the assertion and Fano's inequality that $I(X/Y_1, \ldots, Y_n)$ is
asymptotically equal to $P[g_n(Y_1, \ldots, Y_n) \ne X]$, that is,

$$ \liminf_{n} \frac{1}{n} \log_2 P[g_n(Y_1, \ldots, Y_n) \ne X] = \liminf_{n} \frac{1}{n} \log_2 I(X/Y_1, \ldots, Y_n) . $$